Split up yaml loaders into multiple files #23774

ties · 2019-05-09T10:28:33Z

Breaking Change:

Some implementation details of homeassistant.util.yaml are no longer public (their path changed). Some of these may not be technically internal. All exports from homeassistant.util.yaml that were accesses from outside the tests are still available using their original name. See commit messages for exact details.

API changes:

homessistant.util.yaml.represent_odict is not exposed as part of homeassistant.util.yaml but can be accessed from homeassistant.util.yaml.dumper.represent_odict
homeasistant.util.yaml.{NodeListClass, NodeStrClass} are not exposed as parts of `homeassistant.util.
homeassistant.util.yaml does not longer export imported (and monkeypatched) yaml module

Note that NodeStrClass does not appear to be actually used because the condition in yaml.py#L82 never matches - this is seen in test coverage.

Description:

homeassistant.util.yaml has been split up in preparation for more cleanup, following the structure from Ansible.

I intended to add a feature to the yaml loader to expand environment variables in strings (such as in docker-compose.yml). When considering this I noticed that the homeassistant.util.yaml was one large monolithic file that monkey patches system yaml.

Steps

The first improvement I would like to propose is contained in this PR and splits up the yaml code, following the structure in Ansible where the custom yaml loaders are split up into multiple files with clear roles.

A second step would be to no longer monkey patch yaml.yaml but to create a custom SafeLoader such as in Ansible. Since this is an important change I would propose to do this in a separate iteration.

A (optional) third step would be to add "expand environment variables in strings" to make values (e.g. database connection strings) configurable using environment variables that are expanded using the common ${VAR} syntax instead of !!env and templating the value outside of the configuration. Or to use jinja2 for strings in the configuration such as happens in ansible.

Checklist:

The code change is tested and works locally.
Local tests pass with tox. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated in home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly. Update and include derived files by running python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt by running python3 -m script.gen_requirements_all.
Untested files have been added to .coveragerc.

If the code does not interact with devices:

Tests have been added to verify that the new code works.

Move parts of yaml loader out of the single large file and start to create the structure of the yaml loaders in Ansible [0]. [0]: https://github.com/ansible/ansible/tree/devel/lib/ansible/parsing/yaml

* Move code around to finish the migration * Update the mocks so that `open` is patched in `homeassistant.util.yaml.loader` instead of `homeassistant.util.yaml`. * Updated mypy ignores * Updated external API of `homeasistant.util.yaml`, see below: Checked what part of the api of `homeassistant.util.yaml` was actually called from outside the tests and added an `__ALL__` that contains only these elements. Updated the tests so that references to internal parts of the API (e.g. the yaml module imported into `homeassistant.util.yaml.loader`) are referenced directly from `homeassistant.util.yaml.loader`. In `tests/test_yaml.py` the import `yaml` refers to `homeassistant.util.yaml` and `yaml_loader` refers to `~.loader`. Future work that remains for the next iteration is to create a custom SafeConstructor and refers to that instead of monkey patching `yaml` with custom loaders.

ties · 2019-05-09T10:42:09Z

I think this kind of a tricky PR but think the code gets a lot more structured because of it. It is a purely structural change.

I realized that I did not check other test cases (i.e. test/components for mocks that need to be updated, will do that over lunch) and reopen later.

awarecan · 2019-05-09T13:36:06Z

FYI, there is an open PR #23761 to update PyYaml, you may want to coordinate with @BKPepe

Another reminder, our ultimate goal is to replace PyYaml by ruamel.yaml

ties · 2019-05-09T13:46:13Z

a heads up: Part of the complete test suite is currently unstable on my laptop and a Debian Buster machine so I am relying on circleci to give the heads up that all is ok.

ties · 2019-05-09T14:18:14Z

FYI, there is an open PR #23761 to update PyYaml, you may want to coordinate with @BKPepe

I'll follow that one, thanks for the heads up! I think that I caught all the places directly using homeassistant.util.yaml.yaml.load in one of these commits, that could be beneficial for that PR as well.

Another reminder, our ultimate goal is to replace PyYaml by ruamel.yaml

I was aware of that and think this work is orthogonal to that. If this restructuring (and the structure as in Ansible) is actually clearer to other developers.

Is there a way to add telemetry so we can see how often these special loaders are actually used? In my opinion some of them seem to have niche use cases.

OttoWinter · 2019-05-09T14:41:46Z

A (optional) third step would be to add "expand environment variables in strings" to make values (e.g. database connection strings) configurable using environment variables [...]

I don't think we should expand environment variables in all strings in the config - we shouldn't change the value of any YAML object at all without the user intentionally requesting to do so with a YAML tag.

Reason is: There are probably many real use-cases (like shell script component) where they should not be expanded. If we would expand them, that might be a very un-intuitive annoyance. Plus it would be a bad breaking change. Plus I don't see why !env is bad - yes it's verbose but that's good!

Another option would be a system esphome uses: substitutions (https://esphome.io/guides/configuration-types.html#config-substitutions) - only keys that are explicitly defined by the user are expanded. Everything else that looks like a substitution but is not defined as one, is not expanded.

Or to use jinja2 for strings

Problem is HA templates also use jinja2 - if we evaluate jinja2 at load time it would fundamentally mess with HA templating.

If you want to pursue it further, I would suggest opening an architecture issue (https://github.com/home-assistant/architecture).

balloob

This clean up looks great, thanks !

I do agree that adding further substitutions to our YAML should be treated with care, as there are many follow-up parts of the code that parse the YAML as templates.

ties · 2019-05-13T18:20:38Z

I was away for the weekend so this comes a bit late - but I'll expand on the environment variables.

I don't think we should expand environment variables in all strings in the config - we shouldn't change the value of any YAML object at all without the user intentionally requesting to do so with a YAML tag.

Reason is: There are probably many real use-cases (like shell script component) where they should not be expanded. If we would expand them, that might be a very un-intuitive annoyance. Plus it would be a bad breaking change. Plus I don't see why !env is bad - yes it's verbose but that's good!

Thanks for the quick feedback and the explanation of your considerations!

It is a product change I would definitely not consider before having access to telemetry. Changing semantics without a good reason is annoying. I would only consider this for single line strings and only if it conflicts in a limited number of cases.

The use case is the general "configure through environment variables" pattern that is the part of the 12 factor app pattern that I appreciate.

My use case is building a connection string to influxdb and mysql from the environment variables that docker compose creates. This needs multiple tokens (postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@hass-postgres/${POSTGRESS_DB}) which !env_var does not support. os.path.expandvars could be coerced into doing what I want.

If you want to pursue it further, I would suggest opening an architecture issue (https://github.com/home-assistant/architecture).

That is the way. There are many paths to take (adding a (version) tag to the yaml, only applying to single line strings, only apply to some type of multiline quoting, only apply within quotes, etc). For now I hardcoded the connection string in my config 🙃

* Start moving parts of yaml utils to own module Move parts of yaml loader out of the single large file and start to create the structure of the yaml loaders in Ansible [0]. [0]: https://github.com/ansible/ansible/tree/devel/lib/ansible/parsing/yaml * Finish yaml migration, update tests and mocks * Move code around to finish the migration * Update the mocks so that `open` is patched in `homeassistant.util.yaml.loader` instead of `homeassistant.util.yaml`. * Updated mypy ignores * Updated external API of `homeasistant.util.yaml`, see below: Checked what part of the api of `homeassistant.util.yaml` was actually called from outside the tests and added an `__ALL__` that contains only these elements. Updated the tests so that references to internal parts of the API (e.g. the yaml module imported into `homeassistant.util.yaml.loader`) are referenced directly from `homeassistant.util.yaml.loader`. In `tests/test_yaml.py` the import `yaml` refers to `homeassistant.util.yaml` and `yaml_loader` refers to `~.loader`. Future work that remains for the next iteration is to create a custom SafeConstructor and refers to that instead of monkey patching `yaml` with custom loaders. * Update mocks in yaml dumper, check_config

Start moving parts of yaml utils to own module

469fa36

Move parts of yaml loader out of the single large file and start to create the structure of the yaml loaders in Ansible [0]. [0]: https://github.com/ansible/ansible/tree/devel/lib/ansible/parsing/yaml

ties requested a review from a team as a code owner May 9, 2019 10:28

homeassistant added cla-signed core labels May 9, 2019

ties force-pushed the feature/refactor_yaml branch from 684cd42 to a593603 Compare May 9, 2019 10:36

ties closed this May 9, 2019

Update mocks in yaml dumper, check_config

c0b22e9

ties reopened this May 9, 2019

ties requested a review from kellerza as a code owner May 9, 2019 13:03

ties mentioned this pull request May 9, 2019

Update PyYAML to 5.1 #23761

Closed

5 tasks

balloob approved these changes May 9, 2019

View reviewed changes

balloob merged commit 4004867 into home-assistant:dev May 9, 2019

ties deleted the feature/refactor_yaml branch May 9, 2019 20:42

balloob mentioned this pull request Jun 4, 2019

0.94.0 #24305

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split up yaml loaders into multiple files #23774

Split up yaml loaders into multiple files #23774

ties commented May 9, 2019 •

edited

ties commented May 9, 2019 •

edited

awarecan commented May 9, 2019

ties commented May 9, 2019

ties commented May 9, 2019

OttoWinter commented May 9, 2019

balloob left a comment

ties commented May 13, 2019 •

edited

Split up yaml loaders into multiple files #23774

Split up yaml loaders into multiple files #23774

Conversation

ties commented May 9, 2019 • edited

Breaking Change:

Description:

Steps

Checklist:

ties commented May 9, 2019 • edited

awarecan commented May 9, 2019

ties commented May 9, 2019

ties commented May 9, 2019

OttoWinter commented May 9, 2019

balloob left a comment

Choose a reason for hiding this comment

ties commented May 13, 2019 • edited

ties commented May 9, 2019 •

edited

ties commented May 9, 2019 •

edited

ties commented May 13, 2019 •

edited